(Continued from PR #86) Initial RDNA Windows bring-up for CK FMHA #87
Open
jammm wants to merge 10 commits into
Default RDNA CK FMHA builds to round-to-nearest bf16 conversion so CK forward payloads remain accurate when FlashAttention backward consumes them. This fixes the remaining bf16 RDNA failures without adding skips or loosening tolerances.
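To illustrate why the rounding mode matters here, the following is a minimal sketch that contrasts round-to-nearest-even with plain truncation when narrowing an fp32 value to bf16. The helper names are hypothetical and are not CK's API. That truncation is the mode being replaced is an assumption, since the commit message does not name the previous default.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration only; these are not CK functions.

// Truncation: drop the low 16 bits of the fp32 encoding (always rounds toward zero).
inline std::uint16_t float_to_bf16_truncate(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return static_cast<std::uint16_t>(bits >> 16);
}

// Round-to-nearest-even: add a bias so the discarded bits round the kept bits,
// with exact ties going to the value whose last kept bit is 0.
inline std::uint16_t float_to_bf16_rne(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        // NaN input: set a mantissa bit so the result stays a NaN.
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }
    const std::uint32_t lsb = (bits >> 16) & 1u;  // last bit that survives
    bits += 0x7fffu + lsb;
    return static_cast<std::uint16_t>(bits >> 16);
}

// Widen bf16 back to fp32 by restoring the low 16 bits as zeros.
inline float bf16_to_float(std::uint16_t h) {
    const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

Truncation always rounds toward zero, so its error has a consistent sign and can accumulate across a long reduction, whereas round-to-nearest-even halves the worst-case error per conversion and has no such bias. This would be consistent with the observation above that forward payloads consumed by a backward pass are sensitive to the conversion mode.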
gfx11 WMMA duplicates Q data across subgroups, so the 128x256 Q tile can exceed static_for's 256-iteration limit. Use M=64 for the hdim-256 common forward tile on gfx11 while keeping the existing tile on gfx12 and other architectures.
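The per-architecture tile choice described above can be sketched as a compile-time selector. The enum and function names below are hypothetical and do not come from CK or xFormers; only the values (M=64 on gfx11, the existing M=128 elsewhere, both for the hdim-256 forward tile) are taken from the commit message.

```cpp
// Hypothetical architecture tag; real builds would likely key off compiler
// target macros instead.
enum class GpuArch { gfx11, gfx12, other };

// M dimension of the common forward tile for head dimension 256.
// gfx11 gets the smaller tile; everything else keeps the existing M=128.
constexpr int hdim256_fwd_tile_m(GpuArch arch) {
    return arch == GpuArch::gfx11 ? 64 : 128;
}

static_assert(hdim256_fwd_tile_m(GpuArch::gfx11) == 64, "gfx11 uses the reduced tile");
static_assert(hdim256_fwd_tile_m(GpuArch::gfx12) == 128, "gfx12 keeps the existing tile");
```

Halving M halves the Q tile, which is presumably what brings the unrolled loop back under the 256-iteration limit on gfx11; the exact iteration count depends on details the commit message does not give.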
Point the vendored CK submodule at the review-updated RDNA FMHA changes and keep the async infer pipeline out of gfx11/gfx12 xFormers builds. The async global-to-LDS CK path is not used for current RDNA3/4 FMHA builds and fails to compile when instantiated after the core CK fallback was removed.
Now that ROCm/rocm-libraries#7016 has been approved, theoretically, once it goes in, this one could too, right? Could you please review the changes here, @qianfengz?
Author
That PR needs to be merged first; then we update the submodule here before this one is ready to be merged.
…na-fmha
# Conflicts:
#   third_party/composable_kernel_tiled
#   xformers/csrc/attention/hip_fmha/ck_tiled_rand_uniform_kernel.h
Re-posted from #86 (comment)
Reviews and the final PR merge should happen here.
Needs ROCm/rocm-libraries#7016 to be merged first.
cc @qianfengz @0xDELUXA